
BMC Bioinformatics

Springer Science and Business Media LLC

Preprints posted in the last 7 days, ranked by how well they match BMC Bioinformatics's content profile, based on 383 papers previously published here. The average preprint has a 0.37% match score for this journal, so anything above that is already an above-average fit.

1
Decoding resistance: interpretable machine learning to predict ciprofloxacin resistance in Shigella spp

Gohari, M. R.; Zhang, P.; Villegas, A.; Rosella, L. C.; Patel, S. N.; Hopkins, J. P.; Duvvuri, V. R.

2026-04-11 infectious diseases 10.64898/2026.04.07.26350353 medRxiv
Top 4%
1.7%

Antimicrobial resistance (AMR) is a growing global public health threat that complicates the treatment and control of bacterial infections. Shigella spp., a leading cause of bacterial diarrhea worldwide, has increasingly exhibited resistance to multiple antimicrobial agents, including those commonly recommended for treating severe shigellosis. Although conventional antimicrobial susceptibility testing (AST) remains the reference standard, it is time-consuming and provides limited insight into the genetic mechanisms underlying resistance. Whole-genome sequencing (WGS) has emerged as a complementary approach for AMR detection by enabling direct identification of resistance determinants encoded in bacterial genomes. Machine learning (ML) methods applied to genomic features such as k-mers have shown promise for predicting resistance phenotypes from WGS data; however, applications to Shigella remain limited. In this study, we developed and evaluated an interpretable ML framework for predicting ciprofloxacin resistance using k-mer features derived from WGS data of 1,424 Shigella isolates collected in Ontario, Canada, between 2018 and 2025. K-mers were extracted from known gene targets associated with ciprofloxacin resistance, including the chromosomal quinolone resistance-determining regions (QRDRs: gyrA and parC) and plasmid-mediated determinants (qnr). Supervised ML approaches were trained and compared. We evaluated the influence of k-mer length (k = 11, 15, 21, and 31) on predictive performance and model interpretability, and compared models based on chromosomal determinants alone with models incorporating both chromosomal and plasmid-mediated determinants. The Random Forest classifier achieved the most consistent performance across models. Inclusion of plasmid-mediated determinants improved predictive accuracy relative to chromosomal-only models.
Although differences across k-mer lengths were modest, k = 11 produced the highest area under the receiver operating characteristic curve (AUC) and the lowest Brier score. SHAP analyses localized high-impact features within the QRDRs of gyrA and parC, supporting biological interpretability. These findings demonstrate that biologically informed, k-mer-based ML models can accurately and transparently predict ciprofloxacin resistance in Shigella, supporting their potential integration into genomic AMR surveillance and digital public health frameworks.

Author summary: In this study, we used genome sequencing data to develop machine learning models that predict ciprofloxacin resistance in Shigella directly from bacterial DNA. We focused on small DNA fragments (k-mers) derived from known resistance genes and mutations. Among the approaches tested, a Random Forest model showed the most consistent performance. Combining chromosomal mutations with plasmid-mediated resistance genes improved prediction accuracy and helped identify key genetic regions associated with resistance. These findings demonstrate that machine learning applied to genomic data can predict antibiotic resistance accurately and interpretably, supporting its potential use in genomic surveillance and public health monitoring.
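The k-mer feature step described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: it assumes the gyrA, parC, and qnr target regions have already been extracted from each assembly as plain sequence strings, and it encodes each k-mer as present/absent without collapsing reverse complements (the abstract does not say whether counts or canonical k-mers were used). The function names and toy sequences are hypothetical.

```python
def kmers(seq: str, k: int) -> set[str]:
    """Return the set of all length-k substrings (k-mers) of a DNA sequence, uppercased."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def presence_matrix(isolates: dict[str, str], k: int) -> tuple[list[str], list[list[int]]]:
    """Build a binary isolate-by-k-mer matrix.

    isolates maps a (hypothetical) isolate ID to its concatenated target-region sequence.
    Returns the sorted k-mer vocabulary and one 0/1 row per isolate (1 = k-mer present).
    """
    per_isolate = {name: kmers(seq, k) for name, seq in isolates.items()}
    vocab = sorted(set().union(*per_isolate.values()))
    rows = [[int(km in per_isolate[name]) for km in vocab] for name in isolates]
    return vocab, rows


if __name__ == "__main__":
    # Toy sequences only; real inputs would be QRDR / qnr regions of each isolate.
    vocab, rows = presence_matrix({"iso1": "ACGTAC", "iso2": "ACGTTC"}, k=4)
    print(vocab)  # ['ACGT', 'CGTA', 'CGTT', 'GTAC', 'GTTC']
    print(rows)   # [[1, 1, 0, 1, 0], [1, 0, 1, 0, 1]]
```

The resulting rows could then be fed to any tabular classifier (for example a Random Forest); smaller k gives a smaller vocabulary per region but less positional specificity per feature.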

2
A standardized non-linear approach to studying menstrual cycle effects on brain and behavior

Perovic, M.; Mack, M. L.

2026-04-12 sexual and reproductive health 10.64898/2026.04.10.26350619 medRxiv
Top 4%
1.7%

Menstrual cycles are major biological events with extensive effects on the brain and cognition, experienced by half of the human population. To develop a comprehensive account of human cognition, it is necessary to successfully integrate and characterize menstrual cycle effects in cognitive science research. However, current approaches to menstrual cycle analysis suffer from low data resolution and are not well-equipped to capture the highly variable, non-linear changes in outcomes of interest across the cycle. We present a validated standardized method remedying these issues, demonstrate its utility using hormonal, behavioral, and neuroimaging data, and provide an open-source toolkit to facilitate its use.

3
Fine-Tuning PubMedBERT for Hierarchical Condition Category Classification

Wang, X.; Hammarlund, N.; Prosperi, M.; Zhu, Y.; Revere, L.

2026-04-15 health systems and quality improvement 10.64898/2026.04.13.26350814 medRxiv
Top 6%
1.0%

Automating Hierarchical Condition Category (HCC) assignment directly from unstructured electronic health record (EHR) notes remains an important but understudied problem in clinical informatics. We present HCC-Coder, an end-to-end NLP system that maps narrative documentation to 115 Centers for Medicare & Medicaid Services (CMS) HCC codes in a multi-label setting. On the test dataset, HCC-Coder achieves a macro-F1 of 0.779 and a micro-F1 of 0.756, with a macro-sensitivity of 0.819 and a macro-specificity of 0.998. By contrast, the best results for Generative Pre-trained Transformer (GPT)-4o, obtained under five-shot prompting, are a macro-F1 of 0.735 and a micro-F1 of 0.708. The fine-tuned model thus shows consistent absolute improvements of 4-5 percentage points in F1 over GPT-4o. To address severe label imbalance, we incorporate inverse-frequency weighting and per-label threshold calibration. These findings suggest that domain-adapted transformers provide more balanced and reliable performance than prompt-based large language models for hierarchical clinical coding and risk adjustment.
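To make the metric and imbalance terms in this abstract concrete, here is a small pure-Python sketch. Macro-F1 averages F1 over labels, so rare HCC codes count as much as common ones, while micro-F1 pools true positives, false positives, and false negatives across all labels first. The inverse-frequency weight uses the n_negative / n_positive form that PyTorch's `BCEWithLogitsLoss` accepts as `pos_weight`, and the grid search is one simple way to calibrate a threshold per label on held-out data. The function names, weighting formula, and grid are assumptions for illustration, not details taken from HCC-Coder.

```python
def f1(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN); 0.0 when the label is neither present nor predicted."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def label_counts(y_true, y_pred, j):
    """TP, FP, FN for label j in 0/1 matrices (rows = notes, columns = HCC codes)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
    return tp, fp, fn


def macro_micro_f1(y_true, y_pred):
    """Macro-F1 = mean of per-label F1; micro-F1 = F1 of counts pooled over all labels."""
    counts = [label_counts(y_true, y_pred, j) for j in range(len(y_true[0]))]
    macro = sum(f1(*c) for c in counts) / len(counts)
    tp, fp, fn = (sum(c[i] for c in counts) for i in range(3))
    return macro, f1(tp, fp, fn)


def pos_weights(y_true):
    """Inverse-frequency positive weight per label: n_negative / n_positive (1.0 if no positives)."""
    weights = []
    for j in range(len(y_true[0])):
        n_pos = sum(row[j] for row in y_true)
        n_neg = len(y_true) - n_pos
        weights.append(n_neg / n_pos if n_pos else 1.0)
    return weights


def calibrate_thresholds(scores, y_true, grid=(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9)):
    """Per label, keep the grid threshold with the best validation F1 (ties keep the earlier one)."""
    thresholds = []
    for j in range(len(y_true[0])):
        best_t, best_f1 = grid[0], -1.0
        for t in grid:
            y_pred = [[int(s >= t) for s in row] for row in scores]
            score = f1(*label_counts(y_true, y_pred, j))
            if score > best_f1:
                best_t, best_f1 = t, score
        thresholds.append(best_t)
    return thresholds


if __name__ == "__main__":
    # Toy data: 4 notes x 2 labels (hypothetical, not from the study).
    y_true = [[1, 0], [1, 1], [0, 0], [0, 1]]
    y_pred = [[1, 0], [0, 1], [0, 1], [0, 1]]
    print(macro_micro_f1(y_true, y_pred))
    scores = [[0.9, 0.2], [0.6, 0.8], [0.3, 0.1], [0.1, 0.6]]
    print(calibrate_thresholds(scores, y_true, grid=(0.25, 0.5, 0.75)))  # [0.5, 0.25]
```

Thresholds should be tuned on a validation split rather than the test set; otherwise the reported F1 is optimistically biased.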

4
SPLIT: Safety Prioritization for Long COVID Drug Repurposing via a Causal Integrated Targeting Framework

Pinero, S. L.; Li, X.; Lee, S. H.; Liu, L.; Li, J.; Le, T. D.

2026-04-16 health informatics 10.64898/2026.04.12.26350701 medRxiv
Top 6%
0.9%

Long COVID affects millions of people worldwide, yet no disease-modifying treatment has been approved, and existing interventions have shown only modest and inconsistent benefits. A key reason for this limited progress is that current computational drug repurposing pipelines do not match well with the clinical reality of Long COVID. These patients often have persistent, multisystemic symptoms and may already be taking multiple medications, making treatment safety a primary concern. However, most repurposing workflows still treat safety as a downstream filter and rely on disease-associated targets rather than causal drivers. They also assume that the findings of one analysis would generalize across the diverse presentations of Long COVID. We introduce SPLIT, a safety-first repurposing framework that addresses these limitations. SPLIT prioritizes safety at the start of the candidate evaluation, integrates complementary causal inference strategies to identify likely driver genes, and uses a counterfactual substitution design to compare drugs within specific cohort contexts. When applied to cognitive and respiratory Long COVID cohorts, SPLIT revealed three main findings. First, drugs with similar predicted efficacy could have very different predicted safety profiles. Second, the drugs flagged as unfavorable were often different between the two cohorts, showing that drug prioritization is phenotype-specific. Third, SPLIT flagged 18 drugs currently under active investigation in Long COVID trials as having unfavorable predicted profiles. SPLIT provides a practical framework to identify safer, more context-appropriate candidates earlier in the process, supporting more targeted and better-tolerated treatment strategies for Long COVID.

5
Democratizing Scientific Publishing: A Local, Multi-Agent LLM Framework for Objective Manuscript Editing

Bhansali, R.; Gorenshtein, A.; Westover, B.; Goldenholz, D. M.

2026-04-17 health informatics 10.64898/2026.04.13.26350761 medRxiv
Top 7%
0.8%

Manuscript preparation is a critical bottleneck in scientific publishing, yet existing AI writing tools require cloud transmission of sensitive content, creating data-confidentiality barriers for clinical researchers. We introduce the Paper Analysis Tool (PAT), a free, multi-agent framework that deploys 31 specialized agents powered by small language models (SLMs) to audit manuscripts across multiple quality dimensions without external data transmission. Applied to three published clinical neurological papers, PAT generated 540 evaluable suggestions. Validation by two expert reviewers (R.B., A.G.) confirmed 391 actionable, high-value revisions (90% agreement), achieving a 72.4% overall usefulness accuracy spanning methodological, statistical, and visual domains. Furthermore, deterministic re-evaluation of 126 agent-suggested rewrite pairs using Phase 0 metrics confirmed text improvement: total word count decreased by 25%, passive voice prevalence dropped sharply from 35% to 5%, average sentence length decreased by 24%, long-sentence fraction fell by 67%, and the Flesch-Kincaid grade improved by 17%. Our validation confirms that systematic, agent-driven pre-submission review drives measurable improvements, successfully converting manuscript optimization from an opaque, manual endeavor into a transparent and rigorous scientific process.

6
Can NLP Detect Loneliness in Electronic Health Records? A Proof-of-Concept Study

Park, T.; Habibi, S.; Lowers, J.; Sarker, A.; Bozkurt, S.

2026-04-11 health informatics 10.64898/2026.04.08.26350462 medRxiv
Top 7%
0.7%

Loneliness is clinically important but under-documented in electronic health records (EHRs), posing challenges for secondary use and computational phenotyping. This study evaluated whether natural language processing (NLP) methods can detect and classify loneliness severity from clinical notes. Patients with a loneliness survey (mild, moderate, severe) were identified, and notes from the six months before the survey were retrieved. An expert-expanded lexicon was applied, and transformer models (RoBERTa, ClinicalBERT, Longformer) were fine-tuned for loneliness severity classification. Large language model-based summarization of social and psychiatric history was also tested as an alternative input representation. Performance was evaluated using accuracy, weighted-F1, and per-class F1. All models achieved only modest accuracy (0.3 to 0.7) and struggled to identify severe loneliness, reflecting sparse and inconsistent documentation even among surveyed patients. Summarization marginally improved accuracy, but the gains came primarily from predictions of mild loneliness. Manual review of 100 social worker notes from severely lonely patients found explicit mentions of loneliness in only two cases, confirming that relevant documentation is exceedingly rare. These findings demonstrate that model performance is constrained by the sparse and inconsistent documentation of loneliness in EHRs rather than by deficiencies in the modeling approach itself.

7
JARVIS, should this study be selected for full-text screening? Performance of a Joint AI-ReViewer Interactive Screening tool for systematic reviews

Barreto, G. H. C.; Burke, C.; Davies, P.; Halicka, M.; Paterson, C.; Swinton, P.; Saunders, B.; Higgins, J. P. T.

2026-04-11 health informatics 10.64898/2026.04.08.26350384 medRxiv
Top 7%
0.7%

Background: Systematic reviews are essential for evidence-based decision making in health sciences but require substantial time and resources for manual processes, particularly title and abstract screening. Recent advances in machine learning and large language models (LLMs) have shown promise in accelerating screening with high recall, but efficiency gains are often modest, mostly because no generalisable stopping criterion exists. Here, we introduce and report preliminary findings on the performance of JARVIS, a novel semi-automated active learning system that integrates LLM-based reasoning using the PICOS framework, neural network-based classification, and human decision-making to facilitate abstract screening. Methods: Datasets containing author-made inclusion and exclusion decisions from six published systematic reviews were used to pilot the semi-automated screening system. Model performance was evaluated using recall, specificity, and area under the precision-recall curve (AUC-PR), with full-text inclusion as the ground truth. Estimated workload and financial savings were calculated by comparing total screening time and reviewer costs across manual and semi-automated scenarios. Results: Across the six review datasets, recall ranged between 98.2% and 100%, and specificity between 97.9% and 99.2%, at the defined stopping point. Across iterations, AUC-PR values ranged between 83.8% and 100%. Compared with human-only screening, JARVIS delivered workload savings between 71.0% and 93.6%. When a single reviewer read the excluded records, workload savings ranged between 35.6% and 46.8%. Conclusion: The proposed semi-automated system substantially reduced reviewer workload while maintaining high recall, improving on previously reported approaches. Further validation in larger and more varied reviews, as well as prospective testing, is warranted.
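The screening metrics named in this abstract can be sketched as follows. The abstract does not define "workload savings" exactly, so this sketch assumes it means the fraction of records humans no longer had to screen, with equal time per record; it also assumes records not screened before the stopping point count as excluded. All names and numbers here are hypothetical.

```python
def recall_specificity(truth: list[bool], decision: list[bool]) -> tuple[float, float]:
    """truth: full-text inclusion (ground truth); decision: included at the stopping point.

    Records never screened before stopping are assumed to appear here as excluded (False).
    """
    tp = sum(1 for t, d in zip(truth, decision) if t and d)
    fn = sum(1 for t, d in zip(truth, decision) if t and not d)
    tn = sum(1 for t, d in zip(truth, decision) if not t and not d)
    fp = sum(1 for t, d in zip(truth, decision) if not t and d)
    return tp / (tp + fn), tn / (tn + fp)


def workload_saving(n_total: int, n_screened: int) -> float:
    """Fraction of records humans did not have to screen, assuming equal time per record."""
    return 1 - n_screened / n_total


if __name__ == "__main__":
    # Hypothetical review: humans screened 250 of 1,000 records before the stopping rule fired.
    print(workload_saving(1000, 250))  # 0.75
    print(recall_specificity([True, True, False, False, False],
                             [True, False, False, False, True]))
```

Recall is the metric that matters most for a stopping rule, because each missed relevant record is a study lost from the review, whereas a false inclusion only costs extra full-text screening.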

8
SCOPE: Integrating Organoid Screening and Clinical Variables Through Machine Learning for Cancer Trial Outcome Prediction

Bouteiller, J.; Gryspeert, A.-R.; Caron, J.; Polit, L.; Altay, G.; Cabantous, M.; Pietrzak, R.; Graziosi, F.; Longarini, M.; Schutte, K.; Cartry, J.; Mathieu, J. R.; Bedja, S.; Boileve, A.; Ducreux, M.; Pages, D.-L.; Jaulin, F.; Ronteix, G.

2026-04-11 oncology 10.64898/2026.04.10.26350512 medRxiv
Top 7%
0.7%

Background: Predicting whether a treatment will demonstrate meaningful clinical benefit before committing to a large-scale trial remains a major unmet need in oncology. Patient-derived organoids (PDOs) recapitulate individual tumor drug sensitivity, but have not been used to forecast population-level trial outcomes. We developed SCOPE (Screening-to-Clinical Outcome Prediction Engine), a platform that integrates PDO drug screening with clinical prognostic modeling to predict arm-level median progression-free survival (mPFS) and objective response rate (ORR) without access to any trial outcome data. Patients and methods: SCOPE was trained on 54 treatment lines from patients with metastatic colorectal cancer (mCRC, n=15) and metastatic pancreatic ductal adenocarcinoma (mPDAC, n=39) with matched clinical data and PDO drug screening across 9 compounds. A Clinical Score module captures baseline prognosis; a Drug Screen Score module quantifies treatment-specific organoid sensitivity. To predict trial outcomes, synthetic patient profiles are generated from published eligibility criteria and matched to a biobank of 81 PDO lines. Predictions were externally validated against 32 arms from 23 published trials, treatment ranking was assessed across 8 head-to-head comparisons, and prospective applicability was tested for daraxonrasib (RMC-6236), a novel pan-RAS inhibitor in mPDAC. Results: Predicted mPFS strongly agreed with published outcomes (R²=0.85, MAE=0.82 months; Pearson r=0.92, P<0.001), approaching the empirical concordance between two independently measured clinical endpoints (ORR vs. mPFS, R²=0.87). ORR prediction was similarly robust (R²=0.71, MAE=7.3 percentage points). Integrating organoid and clinical data significantly outperformed either alone (P=0.001). SCOPE correctly identified the superior arm in 7 of 8 head-to-head comparisons (88%, P<0.05). 
Applied to daraxonrasib prior to phase 3 data availability, the platform predicted superiority over standard chemotherapy in KRAS-mutant mPDAC, consistent with emerging clinical data. Conclusion: By combining functional organoid drug screening with clinical modeling, SCOPE generates calibrated efficacy predictions for both established regimens and novel agents without prior clinical data. This approach could support clinical trial design, treatment arm selection, and go/no-go decisions, offering a new tool to improve the efficiency of gastrointestinal cancer drug development.

9
GRASP: Gene-relation adaptive soft prompt for scalable and generalizable gene network inference with large language models

Feng, Y.; Deng, K.; Guan, Y.

2026-04-14 bioinformatics 10.1101/2025.10.20.683485 medRxiv
Top 8%
0.6%

Gene networks (GNs) encode diverse molecular relationships and are central to interpreting cellular function and disease. The heterogeneity of interaction types has led to computational methods specialized for particular network contexts. Large language models (LLMs) offer a unified, language-based formulation of GN inference by leveraging biological knowledge from large-scale text corpora, yet their effectiveness remains sensitive to prompt design. Here, we introduce Gene-Relation Adaptive Soft Prompt (GRASP), a parameter-efficient and trainable framework that conditions inference on each gene pair through only three virtual tokens. Using factorized gene-specific and relation-aware components, GRASP learns to map each pair's biological context into compact soft prompts that combine pair-specific signals with shared interaction patterns. Across diverse GN inference tasks, GRASP consistently outperforms alternative prompting strategies. It also shows a stronger ability to recover unannotated interactions from synthetic negative sets, suggesting its capacity to identify biologically meaningful relationships beyond existing databases. Together, these results establish GRASP as a scalable and generalizable prompting framework for LLM-based GN inference.

10
Cochrane Evaluation of (Semi-) Automated Review (CESAR) Methods: Protocol for an adaptive platform study within reviews

Gartlehner, G.; Banda, S.; Callaghan, M.; Chase, J.-A.; Dobrescu, A.; Eisele-Metzger, A.; Flemyng, E.; Gardner, S.; Griebler, U.; Helfer, B.; Jemiolo, P.; Macura, B.; Minx, J. C.; Noel-Storr, A.; Rajabzadeh Tahmasebi, N.; Sharifan, A.; Meerpohl, J.; Thomas, J.

2026-04-15 health informatics 10.64898/2026.04.13.26350802 medRxiv
Top 8%
0.6%

Background: Artificial intelligence (AI) has the potential to improve the efficiency of evidence synthesis and reduce human error. However, robust methods for evaluating rapidly evolving AI tools within the practical workflows of evidence synthesis remain underdeveloped. This protocol describes a study design for assessing the effectiveness, efficiency, and usability of AI tools in comparison to traditional human-only workflows in the context of Cochrane systematic reviews. Methods: Members of the Cochrane Evaluation of (Semi-) Automated Review (CESAR) Methods Project developed an adaptive platform study-within-a-review (SWAR) design, modeled after clinical platform trials. This design employs a master protocol to concurrently evaluate multiple AI tools (interventions) against a standard human-only process (control) across three key review tasks: title and abstract screening, full-text screening, and data extraction. The adaptive framework allows for the addition or removal of AI tools based on interim performance analyses without necessitating a restart of the study. Performance will be assessed using metrics such as accuracy (sensitivity, specificity, precision), efficiency (time on task), response stability, impact of errors, and usability, in alignment with Responsible use of AI in evidence SynthEsis (RAISE) principles. Results: The study will generate comparative data about the performance and usability of specific AI tools employed in a semi- or fully automated manner relative to standard human effort. The protocol provides a flexible framework for the assessment of AI tools in evidence synthesis, addressing the limitations of static, one-time evaluations. Discussion: This study protocol presents a novel methodological approach to addressing the challenges of evaluating AI tools for evidence syntheses. 
By validating entire workflows rather than individual technologies, the findings will establish an evidence base for determining the viability of integrating AI into evidence-synthesis workflows. The adaptive design of this study is flexible and can be adopted by other investigators, ensuring that the evaluation framework remains relevant as new tools emerge.

11
Adherence to International Pharmacogenomic Recommendations in Paediatric Cancer Care: A Cohort Analysis Embedded Within the MARVEL-PIC Randomised Trial

Chawla, A.; Carter, S.; Dyas, R.; Williams, E.; Moore, C.; Conyers, R.

2026-04-16 genetic and genomic medicine 10.64898/2026.04.15.26348678 medRxiv
Top 8%
0.5%

Background: Pharmacogenomic (PGx) testing can optimise drug efficacy and minimise toxicity, but the extent of prescriber adherence to PGx recommendations remains unclear. We aimed to quantify clinician adherence to international genotype-guided prescribing recommendations in a cohort of paediatric oncology patients. Methods: We reviewed files of children enrolled in the MARVEL-PIC (NCT05667766) randomised controlled trial who had PGx recommendations available. Patients were included if 12 weeks had passed since their PGx report was released to clinicians. Prescribing events were identified for actionable PGx recommendations and classified as "explicitly followed", "inadvertently followed", or "not followed". Adherence was assessed by patient, drug, and recommendation. Results: 2,063 PGx recommendations were available for 216 patients. Of these, 64 (3.1%) were actionable, covering 44 patients and 10 drugs within the 12-week study period. Recommendations were explicitly followed in 57/288 (19.8%) prescribing events, inadvertently followed in 145 (50.3%), and not followed in 86 (29.9%). Mercaptopurine had the highest rate of explicit adherence (87.5%). No significant associations were observed between adherence and age group, cancer type, drug type, or strength of recommendation. Conclusion: Adherence to pharmacogenomic recommendations was very low, highlighting the need to understand barriers to PGx implementation and to consider clinical decision support tools that facilitate adherence.

12
Attitudes and Perceptions of Generative Artificial Intelligence Chatbots in the Scientific Process of Traditional, Complementary, and Integrative Medicine Research: A Large-Scale, International Cross-Sectional Survey

Ng, J. Y.; Tan, J.; Syed, N.; Adapa, K.; Gupta, P. K.; Li, S.; Mehta, D.; Ring, M.; Shridhar, M.; Souza, J. P.; Yoshino, T.; Lee, M. S.; Cramer, H.

2026-04-15 health informatics 10.64898/2026.04.13.26350612 medRxiv
Top 8%
0.5%

Background: Generative artificial intelligence (GenAI) chatbots have shown utility in assisting with various research tasks. Traditional, complementary, and integrative medicine (TCIM) is a patient-centric approach that emphasizes holistic well-being. The integration of TCIM and GenAI presents numerous key opportunities. However, TCIM researchers' attitudes toward GenAI tools remain poorly understood. This large-scale, international cross-sectional survey aimed to elucidate the attitudes and perceptions of TCIM researchers regarding the use of GenAI chatbots in the scientific process. Methods: A search strategy in Ovid MEDLINE identified corresponding authors who were TCIM researchers. Eligible authors were invited to complete an anonymous online survey administered via SurveyMonkey. The survey included questions on socio-demographic characteristics, familiarity with GenAI chatbots, and perceived benefits and challenges of using GenAI chatbots. Results were analysed using descriptive statistics and thematic content analysis. Results: The survey received 716 responses. Most respondents reported familiarity with GenAI chatbots (58.08%) and viewed them as very important to the future of scientific research (54.37%). The most acknowledged benefits included workload reduction (74.07%) and increased efficiency in data analysis/experimentation (71.14%). The most frequently reported challenges involved bias, errors, and limitations. More than half of the respondents (57.02%) expressed a need for training to use GenAI chatbots in the scientific process, alongside an interest in receiving training (72.07%). However, 43.67% indicated that their institutions did not offer these programs. Discussion: A deeper understanding of TCIM researchers' perspectives can better inform future AI applications in this field and guide future policies and collaboration among researchers.

13
Dynamic and Baseline Multi-Task Learning for Predicting Substance Use Initiation in the ABCD Study

Wei, M.; Zhang, H.; Peng, Q.

2026-04-13 addiction medicine 10.64898/2026.04.10.26350655 medRxiv
Top 9%
0.5%

Background: Early initiation of substance use is linked to later adverse outcomes, and risk factors come from multiple domains and are shared across substances. In our previous work, traditional time-to-event Cox models identified individual risk factors, but these models are not designed to jointly model multiple outcomes or capture complex non-linear relationships. Multi-task learning (MTL) can leverage shared structure across related outcomes to improve prediction and distinguish common versus substance-specific predictors. However, most MTL studies rely on baseline features and focus on single outcomes, which limits their ability to capture shared risk and temporal changes. Substance use initiation is a time-dependent process that unfolds during development and reflects changing exposures over time. Baseline-only models cannot capture these changes or represent risk dynamics. Discrete-time modeling provides a practical approach by estimating interval-level initiation risk and combining it into cumulative risk at the subject level. By integrating multi-task learning with dynamic modeling, it is possible to share information across outcomes while capturing how risk evolves over time, which may improve prediction performance. Methods: Using the Adolescent Brain Cognitive Development (ABCD) Study (release 5.1), we developed two complementary multi-task learning (MTL) frameworks to predict initiation of alcohol, nicotine, cannabis, and any substance use. A baseline MTL model predicted fixed-horizon (48-month) initiation using one record per participant, while a dynamic discrete-time MTL model incorporated longitudinal interval data to model time-varying risk. Both models used multi-domain environmental exposures, core covariates, and polygenic risk scores (PRS). Performance was evaluated on a held-out test set using AUROC, PR-AUC, and calibration metrics, and compared with single-task logistic regression (LR). 
Feature importance was assessed using permutation importance and compared with Cox proportional hazards models. Results: MTL showed comparable or improved performance relative to LR, with larger gains for low-prevalence outcomes (cannabis and nicotine). Incorporating longitudinal information led to consistent improvements across all outcomes. Dynamic models increased AUROC by +0.044 to +0.062 for MTL and +0.050 to +0.084 for LR, indicating that temporal information was the primary driver of performance gains. Feature importance analyses showed modest overlap across methods, with higher agreement between dynamic MTL and Cox models than static MTL. A small set of features, including externalizing behavior, parental monitoring, and developmental factors, were consistently identified across all approaches. Conclusions: Dynamic multi-task learning improves the prediction of substance use initiation by leveraging longitudinal structure and shared information across outcomes. While MTL provides additional gains, incorporating time-varying information is the dominant factor for improving performance. Combining baseline and dynamic frameworks offers a comprehensive strategy for identifying robust risk factors and modeling adolescent substance use initiation.
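The discrete-time idea in this abstract (interval-level risk combined into subject-level cumulative risk) can be sketched as below. Each participant is expanded into one row per interval at risk, a model predicts an interval hazard h_t for each row (in a multi-task variant, one output per substance), and the probability of initiation by the horizon is 1 - ∏(1 - h_t). The 0-based interval indexing, helper names, and example hazards are hypothetical; the abstract does not state the interval length, although four 12-month intervals would span the 48-month horizon.

```python
from math import prod
from typing import Optional


def person_period_rows(n_intervals: int, event_interval: Optional[int]) -> list[tuple[int, int]]:
    """Expand one participant into (interval, initiated) rows.

    Rows run from interval 0 until initiation (event_interval, 0-based) or, if the
    participant never initiated (event_interval=None), until the last observed interval.
    """
    last = n_intervals - 1 if event_interval is None else event_interval
    return [(t, int(t == event_interval)) for t in range(last + 1)]


def cumulative_risk(hazards: list[float]) -> float:
    """Combine interval hazards h_t into subject-level risk: 1 - prod(1 - h_t)."""
    return 1 - prod(1 - h for h in hazards)


if __name__ == "__main__":
    # Hypothetical participant: four annual intervals, initiation in the third (index 2).
    print(person_period_rows(4, 2))  # [(0, 0), (1, 0), (2, 1)]
    # Hypothetical model-predicted hazards for four intervals.
    print(round(cumulative_risk([0.02, 0.03, 0.05, 0.08]), 4))  # 0.1692
```

Because rows stop at the initiation interval, the hazard model only ever sees participants who are still at risk, which is what lets time-varying exposures enter the prediction.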

14
A high-throughput Epstein-Barr virus nuclear antigen 1 (EBNA1) serology test strip for nasopharyngeal carcinoma risk screening

Warner, B. E.; Patel, J.; Satterwhite, R.; Wang, R.; Adams-Haduch, J.; Koh, W.-P.; Yuan, J.-M.; Shair, K. H. Y.

2026-04-13 infectious diseases 10.64898/2026.04.08.26350329 medRxiv
Top 9%
0.4%

Purpose: Antibodies to Epstein-Barr virus (EBV) proteins can predict nasopharyngeal carcinoma (NPC) risk. We previously defined a prototype EBNA1 protein panel and multiplex immunoblot assay that distinguishes NPC risk several years before diagnosis. Assay throughput and specificity are critical to effectively implementing a population-level screening program. Here, we developed a strip test assay, EBNA1 SeroStrip-HT, with the objective of increasing throughput and maximizing specificity. Experimental Design: EBNA1 full-length (FL) and glycine-alanine repeat deletion mutants (dGAr) were purified from insect and mammalian cells to screen serum IgA/IgG from prospective cohorts in Singapore and Shanghai, China, with known time intervals to NPC diagnosis. Twenty pre-diagnostic sera collected within 4 years before diagnosis were compared with 96 healthy controls using a nested case-control study design. Results: IgA to mammalian-derived EBNA1 dGAr achieved 85.0% sensitivity and 94.8% specificity (AUC, 0.939) for NPC status. IgA to insect-derived EBNA1 dGAr showed the same sensitivity (85.0%) and similar specificity (93.8%) (AUC, 0.941). IgA to insect-derived EBNA1 FL had a higher sensitivity (90%) but a lower specificity (91.7%) (AUC, 0.940). When EBNA1 FL and dGAr results were combined, subjects positive for both proteins had an odds ratio of 243.67 for NPC incidence compared with double-negative subjects. Conclusion: This study demonstrated the efficacy of EBNA1 SeroStrip-HT for NPC risk assessment and stratification in high- and intermediate-risk populations, yielding high accuracy and a 12-fold increase in throughput over the prototype. The insect system was appropriate for large-scale production of purified EBNA1. Larger, geographically diverse cohorts are warranted to confirm these results, especially in low-incidence populations.
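The accuracy figures in this abstract follow from simple 2x2 arithmetic. Assuming all 20 cases and 96 controls were analyzed, 85.0% sensitivity and 94.8% specificity correspond to 17 of 20 cases testing positive and 91 of 96 controls testing negative. The sketch below shows those calculations and the odds-ratio formula; the odds-ratio counts are hypothetical, since the abstract does not report the underlying table for the double-positive versus double-negative comparison.

```python
def sens_spec(tp: int, fn: int, tn: int, fp: int) -> tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)


def odds_ratio(a: float, b: float, c: float, d: float, correction: float = 0.0) -> float:
    """Odds ratio (a*d) / (b*c) for a 2x2 table.

    a = cases in the positive group (e.g. double-positive), b = controls in that group,
    c = cases in the reference group (e.g. double-negative), d = controls in that group.
    Pass correction=0.5 (Haldane-Anscombe) if any cell is zero.
    """
    a, b, c, d = (x + correction for x in (a, b, c, d))
    return (a * d) / (b * c)


if __name__ == "__main__":
    sens, spec = sens_spec(tp=17, fn=3, tn=91, fp=5)
    print(f"{sens:.1%} {spec:.1%}")  # 85.0% 94.8%
    # Hypothetical counts, not the study's data:
    print(odds_ratio(10, 2, 3, 60))  # 100.0
```

With only 20 cases, a very large odds ratio such as the one reported usually rests on small cell counts, so its confidence interval is likely to be wide; this is one reason the authors call for larger cohorts.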

15
Analytical Choices Impact the Estimation of Rhythmic and Arrhythmic Components of Brain Activity

da Silva Castanheira, J.; Landry, M.; Fleming, S. M.

2026-04-11 neuroscience 10.1101/2025.09.24.678322 medRxiv
Top 10%
0.4%

Brain activity comprises both rhythmic (periodic) and arrhythmic (aperiodic) components. These signal elements vary with healthy aging and disease, and may make distinct contributions to conscious perception. Despite pioneering techniques for parameterizing rhythmic and arrhythmic neural components from power spectra, the methodology for quantifying rhythmic activity remains in its infancy. Previous work has relied either on parametric estimates of rhythmic power extracted with specparam or on estimates of rhythmic power obtained after detrending neural spectra. Variation in the analytical choices used to isolate brain rhythms from background arrhythmic activity makes findings difficult to compare across studies. Whether current approaches can accurately recover the independent contributions of these neural signal elements remains to be established. Here, using simulation and parameter recovery approaches, we show that power estimates obtained from detrended spectra conflate these two neurophysiological components, yielding spurious correlations between spectral model parameters. In contrast, modelled rhythmic power obtained from specparam, which detrends the power spectra and parametrizes brain rhythms, independently recovers the rhythmic and arrhythmic components in simulated neural time series, minimising spurious relationships. We validate these methods using resting-state recordings from a large cohort. Based on our findings, we recommend modelled rhythmic power estimates from specparam for the robust, independent quantification of rhythmic and arrhythmic signal components in cognitive neuroscience.
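specparam models the log10 power spectrum as an aperiodic component plus Gaussian peaks. In its "fixed" mode the aperiodic part is b − log10(f^χ) (offset b, exponent χ); a "knee" mode adds a constant k inside the logarithm. The sketch below builds a spectrum from that model with a hypothetical alpha peak and shows, with known ground truth, why rhythmic power should be measured relative to the aperiodic component: raw log power at the peak frequency shifts with the exponent, while the peak height above the aperiodic component does not. This is a simpler case than the detrending comparison in the abstract, and all parameter values are illustrative assumptions.

```python
import math


def aperiodic(f: float, offset: float, exponent: float, knee: float = 0.0) -> float:
    """specparam-style aperiodic component in log10 power: b - log10(k + f**chi)."""
    return offset - math.log10(knee + f ** exponent)


def gaussian_peak(f: float, height: float, center: float, width: float) -> float:
    """Periodic component: a Gaussian in log10 power (width = standard deviation)."""
    return height * math.exp(-((f - center) ** 2) / (2 * width ** 2))


def log_power(f, offset, exponent, peaks, knee=0.0):
    """Full model: aperiodic component plus the sum of Gaussian peaks."""
    return aperiodic(f, offset, exponent, knee) + sum(
        gaussian_peak(f, *peak) for peak in peaks)


# Hypothetical alpha peak: height 0.5 (log10 units), centre 10 Hz, width 1.5 Hz.
ALPHA = [(0.5, 10.0, 1.5)]

if __name__ == "__main__":
    for chi in (1.0, 2.0):
        raw = log_power(10.0, offset=1.0, exponent=chi, peaks=ALPHA)
        # Subtracting the (here, known) aperiodic component isolates the peak.
        above_aperiodic = raw - aperiodic(10.0, offset=1.0, exponent=chi)
        print(f"exponent={chi}: raw={raw:.2f}, above aperiodic={above_aperiodic:.2f}")
```

Going from an exponent of 1 to 2 lowers the raw log power at 10 Hz from 0.50 to -0.50, while the height above the aperiodic component stays at 0.50. An analysis that treats raw peak power as "rhythmic power" would therefore inherit the aperiodic exponent.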

16
AENEAS Project: First real-time intraoperative application of machine vision-based anatomical guidance in neurosurgery

Sarwin, G.; Ricciuti, V.; Staartjes, V. E.; Carretta, A.; Daher, N.; Li, Z.; Regli, L.; Mazzatenta, D.; Zoli, M.; Seungjun, R.; Konukoglu, E.; Serra, C.

2026-04-11 surgery 10.64898/2026.04.09.26348607 medRxiv
Top 10%
0.4%

Background and Objectives: We report the first intraoperative deployment in neurosurgery of a real-time machine vision system, derived from our previous anatomical detection work, that automatically identifies anatomical structures during endoscopic endonasal surgery. Existing systems demonstrate promising performance in offline anatomical recognition, yet none has so far been implemented during live operations. Methods: A real-time anatomy detection model was trained using the YOLOv8 architecture (Ultralytics). After training in PyTorch, the model was exported to ONNX format and further optimized with the NVIDIA TensorRT engine. Deployment used the NVIDIA Holoscan SDK, and the system ran on an NVIDIA Clara AGX developer kit. We used the model for real-time recognition of intraoperative anatomical structures and compared its output with manual labels of the same video as the reference. Model performance was reported as the average precision at an intersection-over-union threshold of 0.5 (AP50). In addition, the end-to-end delay from frame acquisition to display of the annotated output was measured. Results: A mean AP50 of 0.56 was achieved. The model reliably detected the most relevant landmarks in the transsphenoidal corridor. The mean end-to-end latency of the model was 47.81 ms (median 46.57 ms). Conclusion: For the first time, we demonstrate that clinical-grade, real-time machine-vision assistance during neurosurgery is feasible and can provide continuous, automated anatomical guidance from the surgical field. This approach may enhance intraoperative orientation, reduce cognitive load, and offer a powerful tool for surgical training. These findings represent an initial step toward integrating real-time AI support into routine neurosurgical workflows.
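AP50 counts a detection as correct when its bounding box overlaps a not-yet-matched ground-truth box with an intersection-over-union (IoU) of at least 0.5. The sketch below computes IoU and single-class AP using greedy matching in descending confidence order and all-point interpolation of the precision-recall curve. This is one common convention (close to COCO-style matching), assumed here because the abstract does not state which evaluation code was used; a "mean" AP50 would typically average this value over the anatomical classes.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def average_precision(preds, gts, iou_thr=0.5):
    """Single-class AP at an IoU threshold (AP50 when iou_thr=0.5).

    preds: list of (confidence, box); gts: list of ground-truth boxes.
    """
    if not gts:
        return 0.0
    matched = [False] * len(gts)
    hits = []
    for _, box in sorted(preds, key=lambda p: p[0], reverse=True):
        # Greedily match to the best-overlapping ground truth not yet claimed.
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(gts):
            if not matched[j]:
                overlap = iou(box, gt)
                if overlap > best_iou:
                    best_iou, best_j = overlap, j
        if best_iou >= iou_thr:
            matched[best_j] = True
            hits.append(1)
        else:
            hits.append(0)
    precision, recall, tp = [], [], 0
    for i, hit in enumerate(hits, start=1):
        tp += hit
        precision.append(tp / i)
        recall.append(tp / len(gts))
    # All-point interpolation: make precision non-increasing along recall.
    for i in range(len(precision) - 2, -1, -1):
        precision[i] = max(precision[i], precision[i + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precision, recall):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


if __name__ == "__main__":
    truth = [(0, 0, 10, 10), (20, 20, 30, 30)]
    detections = [(0.9, (0, 0, 10, 10)), (0.8, (50, 50, 60, 60))]
    print(average_precision(detections, truth))  # 0.5: one of two landmarks found
```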

17
Cross-cultural adaptation and psychometric validation of the ISBAR Structured Handover Observation Tool in ICU-to-ward patient transfer

Ni, N.; Zhao, B.; Wang, Y.; Wang, Q.; Ding, J.; Liu, T.

2026-04-14 nursing 10.64898/2026.04.10.26350669 medRxiv
Top 10%
0.4%

The ISBAR framework is used to standardize clinical handovers and enhance patient safety. Observational tools based on ISBAR have been developed to assess the completeness of information transfer. However, these instruments have primarily been developed in non-Chinese contexts, and validated Chinese-language observational tools suitable for clinical practice remain limited. In this study, we conducted a cross-cultural adaptation and psychometric validation of the ISBAR Structured Handover Observation Tool, examining its content validity, inter-rater reliability, and discriminant validity in Chinese clinical settings. The study was conducted in two phases: cross-cultural adaptation, followed by psychometric evaluation in real-world clinical settings. Content validity was assessed using the Content Validity Index (CVI), and inter-rater reliability was evaluated using the Intraclass Correlation Coefficient (ICC) based on a two-way mixed-effects model with absolute agreement. Discriminant validity was examined using the Mann-Whitney U test to compare scores across nurses with varying levels of clinical experience. A total of 233 handover cases involving patient transfers from the intensive care unit (ICU) to general wards were collected, involving 84 nurses. The scale demonstrated good content validity, with item-level CVIs (I-CVI) ranging from 0.88 to 1.00 and a scale-level CVI (S-CVI/Ave) of 0.98. Inter-rater reliability, assessed on fifty randomly selected cases, was high, with an ICC of 0.885 for single-rater assessments and 0.939 for the average of raters. Discriminant validity analysis showed that nurses with more clinical experience had significantly higher total scores than those with less experience (Z = -4.772, p < 0.001). The Chinese version of the ISBAR Structured Handover Observation Tool demonstrates good content validity, high inter-rater reliability, and acceptable discriminant validity. 
This tool provides a standardized and practical method for assessing the completeness of information transfer and is expected to support quality improvement in patient handover from the ICU to general wards in Chinese clinical settings.
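The ICC variant named above (two-way model, absolute agreement) can be computed from two-way ANOVA mean squares: in McGraw and Wong's notation, ICC(A,1) for a single rater and ICC(A,k) for the mean of k raters, with the two-way random and mixed models sharing the same formulas. The sketch below also computes the item-level CVI and S-CVI/Ave, assuming the common convention of a 4-point relevance scale on which ratings of 3 or 4 count as relevant; the abstract does not state the rating scale or the expert panel size, and the example ratings are hypothetical.

```python
def icc_absolute_agreement(ratings):
    """Return (ICC(A,1), ICC(A,k)) for an n-subjects x k-raters table."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_error / ((n - 1) * (k - 1))
    single = (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
    average = (msr - mse) / (msr + (msc - mse) / n)
    return single, average


def item_cvi(expert_ratings, relevant_min=3):
    """I-CVI: share of experts rating the item as relevant (assumed 3 or 4 of 4)."""
    return sum(1 for r in expert_ratings if r >= relevant_min) / len(expert_ratings)


def scale_cvi_ave(items, relevant_min=3):
    """S-CVI/Ave: mean of the item-level CVIs."""
    cvis = [item_cvi(r, relevant_min) for r in items]
    return sum(cvis) / len(cvis)


if __name__ == "__main__":
    # Hypothetical scores: 3 handovers x 2 raters, rater 2 always one point higher.
    single, average = icc_absolute_agreement([[1, 2], [2, 3], [3, 4]])
    print(f"ICC(A,1)={single:.3f}, ICC(A,k)={average:.3f}")  # 0.667, 0.800
```

A constant offset between raters leaves a consistency-type ICC at 1 but lowers the absolute-agreement ICC, which is why the agreement definition matters for a tool scored by different observers.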

18
A safer fluorescent in situ hybridization protocol for cryosections

Chihara, A.; Mizuno, R.; Kagawa, N.; Takayama, A.; Okumura, A.; Suzuki, M.; Shibata, Y.; Mochii, M.; Ohuchi, H.; Sato, K.; Suzuki, K.-i. T.

2026-04-16 molecular biology 10.1101/2025.05.25.655994 medRxiv
Top 10%
0.3%

Fluorescent in situ hybridization (FISH) enables highly sensitive, high-resolution detection of gene transcripts. Moreover, by employing multiple probes, this technique allows multiplexed, simultaneous detection of distinct spatiotemporal gene expression patterns, making it a valuable spatial transcriptomics approach. Owing to these advantages, FISH techniques are rapidly being adopted across diverse areas of basic biology. However, conventional protocols often rely on volatile, toxic reagents such as formalin or methanol, posing potential health risks to researchers. Here, we present a safer protocol that replaces these chemicals with low-toxicity alternatives without compromising the high detection sensitivity of FISH. We validated this protocol using both in situ hybridization chain reaction (HCR) and signal amplification by exchange reaction (SABER)-FISH in frozen sections of various model organisms, including mouse (Mus musculus), amphibians (Xenopus laevis and Pleurodeles waltl), and medaka (Oryzias latipes). Our results demonstrate successful multiplexed detection of morphogenetic and cell-type marker genes in these model animals using this safer protocol. The protocol has the additional advantage of requiring no proteolytic enzyme treatment, thus preserving tissue integrity. Furthermore, we show that this protocol is fully compatible with EGFP immunostaining, allowing simultaneous detection of mRNAs and reporter proteins in transgenic animals. This protocol retains the benefits of highly sensitive, multiplexed, and multimodal detection afforded by integrating in situ HCR and SABER-FISH with immunohistochemistry, while providing a safer option for researchers, thereby offering a valuable tool for basic biology.

19
Virtual Spectral Decomposition with Dendritic Tile Selection: An Explainable AI Framework for Multimodal Tissue Composition Analysis and Immune Phenotyping Across Pancreatic, Lung, and Breast Cancer

Chandra, S.

2026-04-13 oncology 10.64898/2026.04.11.26350689 medRxiv
Top 10%
0.3%

Background: Current deep learning models in computational pathology, radiology, and digital pathology produce opaque predictions that lack the explainable artificial intelligence (xAI) capabilities required for clinical adoption. Despite achieving radiologist-level performance in tasks ranging from whole-slide image (WSI) classification to mammographic screening, these models function as black boxes: clinicians cannot trace predictions to specific biological features, verify outputs against established morphological criteria, or integrate AI reasoning into precision oncology workflows and tumor board decision-making. Methods: We present Virtual Spectral Decomposition (VSD), a modality-agnostic, interpretable-by-design framework that decomposes medical images into six biologically interpretable tissue composition channels using sigmoid threshold functions, the same mathematical structure as CT windowing. Unlike post-hoc xAI methods (Grad-CAM, SHAP, LIME) applied to black-box deep learning models, VSD channels have pre-defined biological meanings derived from tissue physics, providing inherent explainability without sacrificing quantitative rigor. For WSI analysis in digital pathology, we introduce the dendritic tile selection algorithm, a biologically inspired hierarchical architecture that achieves a 70-80% reduction in computation while preferentially sampling the tumor immune microenvironment. VSD is validated across three cancer types and imaging modalities: pancreatic ductal adenocarcinoma (PDAC) on CT imaging, lung adenocarcinoma (LUAD) on H&E-stained pathology slides using TCGA data, and breast cancer on screening mammography. The composition entropy of the six-channel vector is computed as a visual Biological Entropy Index (vBEI), an imaging biomarker quantifying the diversity of active biological defense systems. 
Results: In pancreatic cancer, the fat-to-stroma ratio (a novel CT-derived radiomics biomarker) declines from >5.0 (normal) to <0.5 (advanced PDAC), enabling early detection of desmoplastic invasion before mass formation on standard imaging. In lung cancer, composition entropy from H&E whole-slide images correlates with tumor immune microenvironment markers from RNA-seq (CD3: rho=+0.57, p=0.009; CD8: rho=+0.54, p=0.015; PD-1: rho=+0.54, p=0.013) and predicts overall survival (low-entropy immune-desert phenotype: 71% vs 29% mortality, p=0.032; n=20 TCGA-LUAD), providing immune phenotyping for checkpoint immunotherapy patient selection from a $5 H&E slide without molecular assays. In breast cancer, each lesion type produces a characteristic six-channel fingerprint, which functions as an interpretable computer-aided diagnosis (CAD) system for quantitative BI-RADS assessment and subtype classification (IDC vs ILC vs DCIS vs IBC). A five-level xAI audit trail provides complete traceability from clinical decision support output to specific biological structures visible on the original images. Conclusion: VSD establishes a unified, interpretable-by-design mathematical framework for explainable tissue composition analysis across imaging modalities and cancer types. Unlike black-box deep learning and post-hoc xAI approaches, VSD provides inherently interpretable, clinically verifiable cancer detection and immune phenotyping from standard clinical imaging at existing costs, without requiring foundation model infrastructure, specialized hardware, or molecular assays. The open-source pipeline (Google Colab, Supplementary Material) enables immediate reproducibility and extension to additional cancer types across the pan-cancer TCGA atlas.
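The abstract describes channels built from sigmoid threshold functions and a composition entropy over the six-channel vector. The sketch below illustrates that idea under stated assumptions: each channel is a soft intensity window formed as the difference of two sigmoids; the composition vector is each channel's summed membership, normalized to sum to 1; and entropy is Shannon entropy in nats divided by log(6) so it lies in [0, 1]. The band boundaries (`BANDS`) and `softness` are hypothetical placeholders, not the VSD parameters, and the published vBEI may be defined differently.

```python
import math

# Hypothetical intensity windows (HU-like units) for six channels; placeholders only.
BANDS = [(-200, -50), (-50, 20), (20, 80), (80, 200), (200, 600), (600, 2000)]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def band_membership(value, lo, hi, softness):
    """Soft window: near 1 inside [lo, hi], near 0 outside, with sigmoid edges."""
    return sigmoid((value - lo) / softness) - sigmoid((value - hi) / softness)


def composition(values, bands=BANDS, softness=10.0):
    """Summed channel memberships over all pixels, normalized to sum to 1."""
    totals = [sum(band_membership(v, lo, hi, softness) for v in values)
              for lo, hi in bands]
    total = sum(totals)
    return [t / total for t in totals] if total > 0 else totals


def composition_entropy(p, normalize=True):
    """Shannon entropy (nats) of a composition vector, optionally scaled to [0, 1]."""
    h = -sum(x * math.log(x) for x in p if x > 0)
    return h / math.log(len(p)) if normalize else h


if __name__ == "__main__":
    single_tissue = composition([-120.0] * 10)   # almost all mass in channel 0
    mixed = composition([-120.0, 0.0, 50.0, 150.0, 400.0, 1000.0])
    print(round(composition_entropy(single_tissue), 3),
          round(composition_entropy(mixed), 3))
```

Pixels concentrated in one window give an entropy near 0, while pixels spread across all six windows give an entropy near 1, which matches the abstract's reading of low entropy as low compositional diversity.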

20
VAE (Variational Autoencoder) Based Gastrotype Identification and Predictive Diagnosis of Helicobacter pylori Infection

Ma, Z.; Qiao, Y.

2026-04-13 gastroenterology 10.64898/2026.04.11.26350690 medRxiv
Top 10%
0.3%
Show abstract

Background: The enterotype concept proposed that gut microbiomes cluster into discrete types, but subsequent critiques demonstrated that such clustering depends on methodological choices, that the number of clusters is not fixed, and that faecal samples cannot capture spatial heterogeneity along the gastrointestinal tract. The stomach remains particularly understudied, and no systematic classification of gastric microbial community types exists. Methods: We assembled a multi-cohort dataset of 566 gastric mucosal samples spanning healthy controls to gastric cancer, including both Helicobacter pylori (HP)-negative and HP-positive individuals. Critically, we applied the key methodological lessons of the enterotype debate: we used a variational autoencoder (VAE) for dimensionality reduction to learn a continuous latent representation without forcing discrete structure, determined the optimal number of clusters using the Silhouette index (an absolute validation measure) across K=2 to K=10 rather than selecting a cluster number arbitrarily, and transparently evaluated multiple clustering solutions. This VAE-plus-silhouette workflow directly addresses the critiques leveled against the original enterotype analysis. Results: Four gastrotypes were identified, with K=4 achieving the highest mean silhouette score, indicating good cluster cohesion and separation. Two gastrotypes (Variovorax-type and Trabulsiella-type) were significantly enriched in HP-positive samples, while the other two (Bacteroides-type and Streptococcus-type) were significantly enriched in HP-negative samples. Random Forest and Gradient Boosting achieved excellent baseline performance for predicting HP infection (AUC = 0.990 and 0.993, respectively). Conclusions: The VAE-plus-silhouette workflow provides a robust, data-driven approach for identifying gastrotypes without forcing discrete structure or arbitrarily fixing cluster numbers. 
Using this framework, we identified four gastrotypes with significantly different HP infection rates. Variovorax-type and Trabulsiella-type showed strong HP-positive enrichment, while Bacteroides-type and Streptococcus-type showed strong HP-negative enrichment. These findings demonstrate that methodological advances from the enterotype controversy can be successfully transferred to the stomach, offering a reproducible taxonomy for stratifying HP infection status with potential clinical utility.
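Choosing K by the mean silhouette score means clustering the latent representation once for each K in a range and keeping the K with the highest score. The abstract does not name the clustering algorithm applied to the VAE latent space, so the sketch below assumes k-means with deterministic farthest-first seeding and uses plain Python on small 2-D points standing in for latent vectors. Singleton clusters receive a silhouette of 0, following the usual convention.

```python
import math


def farthest_first_init(points, k):
    """Deterministic seeding: points[0], then the point farthest from chosen centres."""
    centers = [tuple(points[0])]
    while len(centers) < k:
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(tuple(nxt))
    return centers


def kmeans(points, k, max_iter=100):
    """Plain Lloyd's k-means; returns one integer label per point."""
    centers = farthest_first_init(points, k)
    labels = []
    for _ in range(max_iter):
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old centre if a cluster empties out.
            new_centers.append(tuple(sum(xs) / len(members) for xs in zip(*members))
                               if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def silhouette(points, labels):
    """Mean silhouette s(i) = (b - a) / max(a, b) over all points."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, p in enumerate(points):
        own = clusters[labels[i]]
        if len(own) == 1:
            scores.append(0.0)  # singleton convention
            continue
        a = sum(math.dist(p, points[j]) for j in own if j != i) / (len(own) - 1)
        b = min(sum(math.dist(p, points[j]) for j in members) / len(members)
                for lab, members in clusters.items() if lab != labels[i])
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


def select_k(points, k_values=range(2, 11)):
    """Cluster for each K and return (best K, {K: mean silhouette})."""
    scores = {k: silhouette(points, kmeans(points, k)) for k in k_values}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Three well-separated toy groups standing in for latent vectors.
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 0), (10, 1), (11, 0), (11, 1),
           (0, 10), (0, 11), (1, 10), (1, 11)]
    best_k, _ = select_k(pts, range(2, 5))
    print(best_k)  # 3
```

Because the silhouette compares within-cluster and nearest-other-cluster distances on an absolute scale, it can penalize both merging real groups (K too small) and splitting them (K too large), which is the property the abstract relies on.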